Extraction Programs: A Unified Approach to Translation Rule Extraction
Abstract
We provide a general algorithmic schema for translation rule extraction and show that several popular extraction methods (including phrase pair extraction, hierarchical phrase pair extraction, and GHKM extraction) can be viewed as specific instances of this schema. This work is primarily intended as a survey of the dominant extraction paradigms, in which we make explicit the close relationship between these approaches, and establish a language for future hybridizations. This facilitates a generic and extensible implementation of alignment-based extraction methods.
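As a concrete point of reference for the simplest of the methods named above, the following is a minimal sketch of alignment-consistent phrase pair extraction. It is not the schema proposed in the paper: it is the standard consistency criterion (a phrase pair is kept when it contains at least one alignment link and no link crosses its boundary), and the function name `extract_phrase_pairs`, the `max_len` parameter, and the toy sentence pair are assumptions made for illustration.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Return phrase pairs consistent with a word alignment.

    src, tgt  : lists of tokens
    alignment : iterable of (src_index, tgt_index) links
    max_len   : hypothetical cap on phrase length (in tokens) on either side
    """
    links = set(alignment)
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            for j1 in range(len(tgt)):
                for j2 in range(j1, min(j1 + max_len, len(tgt))):
                    # Require at least one link inside the candidate box.
                    has_link = any(
                        i1 <= i <= i2 and j1 <= j <= j2 for (i, j) in links
                    )
                    if not has_link:
                        continue
                    # A link "crosses" the box if exactly one of its
                    # endpoints falls inside the source/target span.
                    crossing = any(
                        (i1 <= i <= i2) != (j1 <= j <= j2) for (i, j) in links
                    )
                    if not crossing:
                        pairs.append(
                            (" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1]))
                        )
    return pairs


# Toy example (assumed data): a monotone one-to-one alignment.
src = ["das", "Haus"]
tgt = ["the", "house"]
alignment = [(0, 0), (1, 1)]
pairs = extract_phrase_pairs(src, tgt, alignment)
```

The brute-force enumeration over all span pairs is chosen for readability rather than speed. Hierarchical and GHKM extraction impose further structure on top of the same consistency idea, which this sketch does not attempt to cover.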
Similar resources
Reducing SMT Rule Table with Monolingual Key Phrase
This paper presents an effective approach to discard most entries of the rule table for statistical machine translation. The rule table is filtered by monolingual key phrases, which are extracted from source text using a technique based on term extraction. Experiments show that 78% of the rule table is reduced without worsening translation performance. In most cases, our approach results in mea...
Compact Rule Extraction for Hierarchical Phrase-based Translation
This paper introduces two novel approaches for extracting compact grammars for hierarchical phrase-based translation. The first is a combinatorial optimization approach and the second is a Bayesian model over Hiero grammars using Variational Bayes for inference. In contrast to the conventional Hiero (Chiang, 2007) rule extraction algorithm, our methods extract compact models reducing model siz...
Learning Better Rule Extraction with Translation Span Alignment
This paper presents an unsupervised approach to learning translation span alignments from parallel data that improves syntactic rule extraction by deleting spurious word alignment links and adding new valuable links based on bilingual translation span correspondences. Experiments on Chinese-English translation demonstrate improvements over standard methods for tree-to-string and tree-to-tree tr...
Hybrid Domain Adaptation for a Rule Based MT System
This study presents several experiments to show the power of domain-specific adaptation by means of hybrid terminology extraction mechanisms and the subsequent terminology integration into a rule based machine translation (RBMT) system, thus avoiding cumbersome human lexicon and grammar customization. Detailed evaluation reveals the great potential of this approach: Translation quality can be i...
Semantic Roles for String to Tree Machine Translation
We experiment with adding semantic role information to a string-to-tree machine translation system based on the rule extraction procedure of Galley et al. (2004). We compare methods based on augmenting the set of nonterminals by adding semantic role labels, and altering the rule extraction process to produce a separate set of rules for each predicate that encompass its entire predicate-argument...